8351500: G1: NUMA migrations cause crashes in region allocation #3607

Open
wants to merge 1 commit into
base: master
from

Conversation

tstuefe
Member

@tstuefe tstuefe commented May 22, 2025

This is not a clean backport. The affected G1Allocator and G1Collector methods have changed since JDK17.

So this backport reimplements the patch in a minimally invasive way while retaining as much similarity as possible with the original patch.

The gist of the patch is clear: instead of looking up the NUMA node index at every call to G1Allocator::allocate_xxx, which leaves us exposed to NUMA node migrations between lookups, we determine the NUMA node index once and use that one throughout.

I tested this patch with my "FakeNUMA" addition (I plan to upstream that one at some point). This FakeNUMA mode mimics a lot of NUMA node migrations. I can verify that without this patch the JVM crashes quickly; with the patch it does not crash.
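
To illustrate the idea of fixing the NUMA node index once, here is a minimal, self-contained C++ sketch. It is not the HotSpot code and not the actual patch: `FakeNuma`, `NodeAllocator`, `refill_racy`, and `refill_fixed` are hypothetical names invented for this illustration, and the "migration" is simulated by a scripted sequence of node indices. The sketch only shows why querying the node index twice within one logical operation can leave per-node bookkeeping inconsistent, and how querying it once avoids that.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for an OS query of the current NUMA node.
// Each call returns the next entry of a scripted schedule, which mimics
// a thread being migrated to another node between two queries.
struct FakeNuma {
  std::vector<unsigned> schedule;  // node index returned by successive queries
  std::size_t calls;

  explicit FakeNuma(const std::vector<unsigned>& s) : schedule(s), calls(0) {}

  unsigned current_node_index() {
    assert(!schedule.empty());
    unsigned node = schedule[calls % schedule.size()];
    ++calls;
    return node;
  }
};

// Hypothetical allocator with one slot of bookkeeping per NUMA node.
class NodeAllocator {
 public:
  NodeAllocator(FakeNuma& numa, unsigned num_nodes)
      : _numa(numa), _retired(num_nodes, 0), _installed(num_nodes, 0) {}

  // Problematic pattern: the node index is looked up at each step, so a
  // migration between the two lookups makes the steps act on different nodes.
  void refill_racy() {
    retire_region(_numa.current_node_index());
    install_region(_numa.current_node_index());
  }

  // Pattern described above: look up the node index once, then pass
  // that same value to every step of the operation.
  void refill_fixed() {
    const unsigned node = _numa.current_node_index();
    retire_region(node);
    install_region(node);
  }

  // True if every node has seen as many retirements as installations.
  bool consistent() const { return _retired == _installed; }

 private:
  void retire_region(unsigned node) { _retired.at(node) += 1; }
  void install_region(unsigned node) { _installed.at(node) += 1; }

  FakeNuma& _numa;
  std::vector<unsigned> _retired;
  std::vector<unsigned> _installed;
};
```

With a schedule that alternates between node 0 and node 1, `refill_racy` retires on one node and installs on the other, while `refill_fixed` keeps both steps on the same node no matter how often the simulated thread migrates.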


Progress

  • Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue
  • JDK-8351500 needs maintainer approval

Issue

  • JDK-8351500: G1: NUMA migrations cause crashes in region allocation (Bug - P3 - Requested)

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk17u-dev.git pull/3607/head:pull/3607
$ git checkout pull/3607

Update a local copy of the PR:
$ git checkout pull/3607
$ git pull https://git.openjdk.org/jdk17u-dev.git pull/3607/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 3607

View PR using the GUI difftool:
$ git pr show -t 3607

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk17u-dev/pull/3607.diff

Using Webrev

Link to Webrev Comment

@bridgekeeper

bridgekeeper bot commented May 22, 2025

👋 Welcome back stuefe! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk

openjdk bot commented May 22, 2025

❗ This change is not yet ready to be integrated.
See the Progress checklist in the description for automated requirements.

@openjdk openjdk bot changed the title Backport 37ec796255ae857588a5c7e0d572407dd81cbec9 8351500: G1: NUMA migrations cause crashes in region allocation May 22, 2025
@openjdk

openjdk bot commented May 22, 2025

This backport pull request has now been updated with issue from the original commit.

@openjdk openjdk bot added the backport label May 22, 2025
@tstuefe tstuefe marked this pull request as ready for review May 22, 2025 16:56
@openjdk openjdk bot added the rfr Pull request is ready for review label May 22, 2025
@tstuefe
Member Author

tstuefe commented May 22, 2025

/approval request This is a solution for a quite hairy problem that hits customers on NUMA machines. It is rare, intermittent, and quite difficult to pinpoint. I therefore would appreciate it if I got approval for backporting.

Risk: lowish. The mechanism is quite clear, and the patch, albeit unclean, is simple in essence. I also took care to test that it works well (see JBS description for details).

@mlbridge

mlbridge bot commented May 22, 2025

Webrevs

@openjdk

openjdk bot commented May 22, 2025

@tstuefe
8351500: The approval request has been created successfully.

@openjdk openjdk bot added the approval label May 22, 2025
@tstuefe
Member Author

tstuefe commented May 22, 2025

Ping @sjohanss - could I get a review for this?

Labels
approval backport rfr Pull request is ready for review